Skip to content
This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

feat: otto knowledge refactoring #159

Merged
merged 8 commits into from
Oct 29, 2024
Merged

feat: otto knowledge refactoring #159

merged 8 commits into from
Oct 29, 2024

Conversation

iwilltry42
Copy link
Collaborator

@iwilltry42 iwilltry42 commented Oct 28, 2024

  • add: load.gpt tool as a wrapper around knowledge load to transform an input file into Knowledge JSON format and store it to another file.
  • change: ingest and load output final error (if any) as JSON to stdout (and exit with 0) or don't print anything at all. All other logging should go to stderr (slog does that by default, so we're good)
    $ knowledge load ~/Downloads/Tags.xlsx out.json
    {"error": "unsupported file type \"/home/thklein/Downloads/Tags.xlsx\""}
    
  • add: --metadata foo=bar for ingest and load commands (also usable via KNOWLEDGE="foo=bar,baz=bom" in env)
  • change: ingest and load both use the gptscript SDK to read from and write to the workspace if paths are prefixed witth ws:// (to keep knowledge usable as a standalone tool, making it easier to test and evaluate)

@iwilltry42 iwilltry42 marked this pull request as ready for review October 28, 2024 19:08
Copy link
Member

@g-linville g-linville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only other suggestion is to replace ws:// with workspace:// since ws:// is ordinarily used for WebSocket URLs.

Comment on lines +125 to +128
if strings.HasPrefix(paths[0], "ws://") {
if len(paths) > 1 {
return 0, 0, fmt.Errorf("cannot ingest multiple paths from workspace")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would not catch the case where the user passes a normal path as the first parameter, and a ws:// path as the second.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think path will be "director:///" if it is local, and s3 and r2 s3:///" r2:///`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we're only checking the very first path for the ws:// prefix. If it's on the second item in the paths slice, it won't get caught.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ws:// is overloaded, but fairly clear with regards to knowledge (which is file-based) I guess 🤷‍♂️

Anyway, those checks are "superficial", as only Otto should be using this stuff.

But you're right and I will fix the checks later on 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm willing to accept these shortcomings (that it only checks the first path and that ws is overloaded).

@g-linville @StrongMonkey if this is your only issue with the PR, can your finish your review?

if err != nil {
return err
exitErr0(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to do this because otherwise cobra will return a non-zero if you bubble an err all the way up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

pkg/client/standalone.go Outdated Show resolved Hide resolved
default: true
globals:
ingestion:
textsplitter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean you are just using textsplitter until you write the "enhanced" splitter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - but the new one will land today

Copy link
Contributor

@cjellick cjellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two comments, but they are just non-blocking questions

load.gpt Outdated Show resolved Hide resolved
Co-authored-by: Darren Shepherd <darren@acorn.io>
@iwilltry42 iwilltry42 merged commit bf3fde2 into main Oct 29, 2024
1 check passed
@iwilltry42 iwilltry42 deleted the change/otto branch October 29, 2024 05:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants